Analysis of acoustic models trained on a large-scale Japanese speech database

Authors

  • Tomoko Matsui
  • Masaki Naito
  • Yoshinori Sagisaka
  • Kozo Okuda
  • Satoshi Nakamura
Abstract

This paper investigates the performance of speaker-independent (SI) acoustic hidden Markov models (HMMs) trained on a very large Japanese speech database, and discusses the efficiency and task independence involved. The database consists of read and spontaneous speech uttered by 3,771 speakers. The speakers are widely distributed with respect to region and age, so as to capture the characteristics of Japanese speech as fully as possible. Recognition experiments using the spontaneous speech show that task-independent acoustic models can be created when training data from a very large number of speakers is available.

Age     Male (%)       Female (%)     Total (%)
10s      271 (19.6)     486 (20.3)     757 (20.1)
20s      827 (59.9)    1253 (52.4)    2080 (55.2)
30s      176 (12.7)     404 (16.9)     580 (15.4)
40s       77 ( 5.6)     182 ( 7.6)     259 ( 6.9)
50s       27 ( 2.0)      62 ( 2.6)      89 ( 2.4)
60s        3 ( 0.2)       3 ( 0.1)       6 ( 0.2)
Total   1381 (36.6)    2390 (63.4)    3771 (100.0)

Table 1. Number of male and female speakers for each age group

Region  Male (%)       Female (%)     Total (%)
N        123 ( 8.9)     151 ( 6.3)     274 ( 7.3)
NE        77 ( 5.6)     122 ( 5.1)     199 ( 5.3)
CE       223 (16.1)     445 (18.6)     668 (17.7)
C        260 (18.8)     418 (17.5)     678 (18.0)
CW       407 (29.5)     663 (27.7)    1070 (28.4)
W         96 ( 7.0)     196 ( 8.2)     292 ( 7.7)
SE        56 ( 4.1)      98 ( 4.1)     154 ( 4.1)
SW       138 (10.0)     293 (12.3)     431 (11.4)
Others     1 ( 0.1)       4 ( 0.2)       5 ( 0.1)
Total   1381 (36.6)    2390 (63.4)    3771 (100.0)

Table 2. Number of male and female speakers for each region
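The percentages in Tables 1 and 2 are consistent with being computed within each column: for example, 271 of the 1,381 male speakers (19.6%) are in their teens, while the male and female percentages in the Total row are shares of all 3,771 speakers. The short Python sketch below recomputes both kinds of percentage from the raw counts as a consistency check. The dictionary and function names (`AGE_COUNTS`, `REGION_COUNTS`, `totals`, `column_percentages`, `sex_shares`) are hypothetical illustrations, not anything defined in the paper.

```python
# Illustrative sketch: recompute the percentages of Tables 1 and 2 from the
# raw speaker counts. All names here are hypothetical, not from the paper.

# (male, female) speaker counts per age group, transcribed from Table 1.
AGE_COUNTS = {
    "10s": (271, 486),
    "20s": (827, 1253),
    "30s": (176, 404),
    "40s": (77, 182),
    "50s": (27, 62),
    "60s": (3, 3),
}

# (male, female) speaker counts per region, transcribed from Table 2.
REGION_COUNTS = {
    "N": (123, 151),
    "NE": (77, 122),
    "CE": (223, 445),
    "C": (260, 418),
    "CW": (407, 663),
    "W": (96, 196),
    "SE": (56, 98),
    "SW": (138, 293),
    "Others": (1, 4),
}


def totals(counts):
    """Return (male, female, all) speaker totals summed over every group."""
    male = sum(m for m, _ in counts.values())
    female = sum(f for _, f in counts.values())
    return male, female, male + female


def column_percentages(counts):
    """Return {group: (male %, female %, total %)} rounded to one decimal.

    Each percentage is taken relative to its own column total.
    """
    male_total, female_total, grand_total = totals(counts)
    return {
        group: (
            round(100 * m / male_total, 1),
            round(100 * f / female_total, 1),
            round(100 * (m + f) / grand_total, 1),
        )
        for group, (m, f) in counts.items()
    }


def sex_shares(counts):
    """Return (male %, female %) as shares of all speakers (the Total row)."""
    male_total, female_total, grand_total = totals(counts)
    return (
        round(100 * male_total / grand_total, 1),
        round(100 * female_total / grand_total, 1),
    )


if __name__ == "__main__":
    print(totals(AGE_COUNTS), sex_shares(AGE_COUNTS))
    for group, (mp, fp, tp) in column_percentages(AGE_COUNTS).items():
        print(f"{group:<6} M {mp:5.1f}  F {fp:5.1f}  Total {tp:5.1f}")
```

Running the script prints `(1381, 2390, 3771) (36.6, 63.4)` followed by one row per age group matching Table 1; calling `column_percentages(REGION_COUNTS)` reproduces the per-region rows of Table 2 in the same way.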


Similar resources

Sharable software repository for Japanese large vocabulary continuous speech recognition

The Japanese LVCSR (Large Vocabulary Continuous Speech Recognition) platform project is introduced. It is a collaboration of researchers from different academic institutes and is intended to develop a sharable software repository of not only databases but also models and programs. The platform consists of a standard recognition engine, Japanese phone models and Japanese statistical language mod...


Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods

Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterances into transcriptions. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...


Recognition and Verification of Engl for Computer-assisted Languag

We address methods for recognizing English spoken by Japanese students as the basis for our Computer-Assisted Language Learning (CALL) system. For automatic phonemic error detection, pronunciation error prediction is executed for a given orthographic text. To improve reliability, speaker adaptation and segment-input pair-wise verification are applied as pre-processing and post-processing, respe...


Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting

Islamic Republic of Iran Broadcasting (IRIB), one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieving this archive, is one of the key issues for this broadcasting...


Robust spoken language identification using large vocabulary speech recognition

A robust, task-independent spoken Language Identification (LID) system which uses a Large Vocabulary Continuous Speech Recognition (LVCSR) module for each language to choose the most likely language spoken is described. The acoustic analysis uses mean cepstral removal on mel-scale cepstral coefficients to compensate for different input channels. The system has been trained on 5 languages: English, ...




Publication date: 2000